Fuzzy clustering of categorical data using fuzzy centroids

Authors

  • Dae-Won Kim
  • Kwang Hyung Lee
  • Doheon Lee
Abstract

In this paper, the conventional fuzzy k-modes algorithm for clustering categorical data is extended by representing the clusters of categorical data with fuzzy centroids instead of the hard-type centroids used in the original algorithm. Use of fuzzy centroids makes it possible to fully exploit the power of fuzzy sets in representing the uncertainty in the classification of categorical data. To test the proposed approach, the proposed algorithm and two conventional algorithms (the k-modes and fuzzy k-modes algorithms) were used to cluster three categorical data sets. The proposed method was found to give markedly better clustering results. © 2004 Elsevier B.V. All rights reserved.
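The abstract does not state the formulas, so the following is only a minimal sketch of the fuzzy-centroid idea under stated assumptions, not the authors' implementation. It assumes that (1) a fuzzy centroid stores, for every attribute, a weight for each category, computed as the sum of u^m over the objects taking that category divided by the sum of u^m over all objects (u is membership, m > 1 the fuzzifier); (2) the dissimilarity between an object and a fuzzy centroid is the expected simple-matching mismatch, i.e. the sum over attributes of 1 minus the centroid's weight for the object's category; and (3) memberships are updated with the fuzzy k-modes / fuzzy c-means style rule using exponent 1/(m-1). All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

# Assumed representation: one {category: weight} dict per attribute.
FuzzyCentroid = List[Dict[str, float]]


def compute_fuzzy_centroids(data: Sequence[Sequence[str]], U: List[List[float]],
                            m: float) -> List[FuzzyCentroid]:
    """Weight each category by the sum of u**m over the objects that take it (assumption)."""
    n_attr = len(data[0])
    centroids = []
    for row in U:  # row[i] is the membership of object i in this cluster
        weights = [u ** m for u in row]
        total = sum(weights) or 1.0  # guard against a cluster with zero total membership
        centroid = []
        for l in range(n_attr):
            counts: Dict[str, float] = {}
            for x, w in zip(data, weights):
                counts[x[l]] = counts.get(x[l], 0.0) + w
            centroid.append({a: s / total for a, s in counts.items()})
        centroids.append(centroid)
    return centroids


def dissimilarity(x: Sequence[str], centroid: FuzzyCentroid) -> float:
    """Expected simple-matching mismatch: sum of (1 - weight of x's category) per attribute."""
    return sum(1.0 - dist.get(v, 0.0) for v, dist in zip(x, centroid))


def update_memberships(data, centroids, m):
    """Fuzzy k-modes style update: u_ji = 1 / sum_z (d_ji / d_zi) ** (1 / (m - 1))."""
    k, n = len(centroids), len(data)
    U = [[0.0] * n for _ in range(k)]
    for i, x in enumerate(data):
        d = [dissimilarity(x, c) for c in centroids]
        exact = [j for j in range(k) if d[j] == 0.0]
        if exact:  # object coincides with some centroids: split membership among them
            for j in exact:
                U[j][i] = 1.0 / len(exact)
            continue
        for j in range(k):
            U[j][i] = 1.0 / sum((d[j] / d[z]) ** (1.0 / (m - 1.0)) for z in range(k))
    return U


def fuzzy_centroid_clustering(data, k, m=1.5, max_iter=100, tol=1e-6, seed=0):
    """Alternate centroid and membership updates until memberships stop changing."""
    rng = random.Random(seed)
    n = len(data)
    # Random initial fuzzy partition; each object's memberships sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(k)]
    for i in range(n):
        col = sum(U[j][i] for j in range(k))
        for j in range(k):
            U[j][i] /= col
    for _ in range(max_iter):
        centroids = compute_fuzzy_centroids(data, U, m)
        new_U = update_memberships(data, centroids, m)
        shift = max(abs(new_U[j][i] - U[j][i]) for j in range(k) for i in range(n))
        U = new_U
        if shift < tol:
            break
    return U, compute_fuzzy_centroids(data, U, m)
```

Compared with a hard mode, which keeps only the most frequent category per attribute, the fuzzy centroid in this sketch keeps the whole membership-weighted category distribution, so an object that shares a minority category with a cluster still counts as partially similar to it. Like other k-modes variants, the result depends on the random initial partition (the `seed` parameter here).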

Similar resources

A fuzzy k-partitions model for categorical data and its comparison to the GoM model

The grade of membership (GoM) model uses fuzzy sets as memberships of each individual to extreme profiles (or classes) on the likelihood function of multivariate multinomial distributions. The GoM clustering algorithm derived from the GoM model is used in cluster analysis for categorical data, but it is iterated with complicated calculations. In this paper we create another approach, termed a f...

Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

Several clustering analysis techniques for categorical data exist to divide similar objects into groups. Some are able to handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. On the basis of thi...

Modified Particle Swarm Optimization Based Adaptive Fuzzy K-Modes Clustering for Heterogeneous Medical Databases

The main purpose of data mining is to extract hidden predictive knowledge of useful information and patterns of data from large databases for utilizing it in decision support. Medical field has large amount of various heterogeneous databases, in which the extraction of hidden useful knowledge for the classification of data is difficult one. In order to cluster and classify the whole databases o...

A fuzzy k-modes algorithm for clustering categorical data

This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of means for clusters, a new approach is developed, which allows the use of the k-means paradigm to efficiently cluster large categorical data sets. A fuzzy k-modes algorithm is presented and the effec...

A New Kernelized Fuzzy C-Means Clustering Algorithm with Enhanced Performance

Recently Kernelized Fuzzy C-Means clustering technique where a kernel-induced distance function is used as a similarity measure instead of a Euclidean distance which is used in the conventional Fuzzy C-Means clustering technique, has earned popularity among research community. Like the conventional Fuzzy C-Means clustering technique this technique also suffers from inconsistency in its performa...

Journal:
  • Pattern Recognition Letters

Volume 25

Publication date: 2004